Welcome to FLYDATA, where we transform aviation data into actionable insights. This demonstration showcases our analytical capabilities through an exploration of the nycflights23 dataset, which covers flights departing New York City airports in 2023. The sections below show the kinds of questions we can answer.
At FLYDATA, we specialize in transforming raw aviation data into actionable insights that empower our customers to optimize their operations and enhance customer satisfaction. By providing critical statistics such as the average, median, and standard deviation for departure delay, arrival delay, and air time, we enable our clients to gain a comprehensive understanding of their flight punctuality and operational efficiency.
# Load the packages used throughout this report
library(nycflights23)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(plotly)

# Summarize the mean, median, and standard deviation of departure delay,
# arrival delay, and air time (all in minutes), one row per variable
flights_summary <- flights |>
  reframe(
    averages = c(mean(dep_delay, na.rm = TRUE),
                 mean(arr_delay, na.rm = TRUE),
                 mean(air_time, na.rm = TRUE)),
    median = c(median(dep_delay, na.rm = TRUE),
               median(arr_delay, na.rm = TRUE),
               median(air_time, na.rm = TRUE)),
    standard_deviation = c(sd(dep_delay, na.rm = TRUE),
                           sd(arr_delay, na.rm = TRUE),
                           sd(air_time, na.rm = TRUE))
  ) |>
  mutate(variable = c("Departure Delay", "Arrival Delay", "Airtime"))

# Display the summary as a styled HTML table
kable(flights_summary, "html") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| averages | median | standard_deviation | variable |
|---|---|---|---|
| 13.837372 | -2 | 54.31385 | Departure Delay |
| 4.344803 | -10 | 57.86889 | Arrival Delay |
| 141.820258 | 121 | 89.17256 | Airtime |
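The gap between the mean and the median in the table above (a mean departure delay of about 13.8 minutes against a median of -2) is what a right-skewed distribution looks like: most flights leave close to schedule, and a few very long delays pull the mean up. The sketch below illustrates the effect with a small made-up vector of delays; the values are hypothetical and are not taken from `flights`.

```r
# Hypothetical departure delays in minutes (not taken from the dataset):
# most flights leave slightly early, one flight is two hours late
toy_delays <- c(-5, -3, -2, -2, 0, 1, 120)

toy_mean <- mean(toy_delays)      # pulled upward by the single long delay
toy_median <- median(toy_delays)  # unaffected by how extreme that delay is

c(mean = toy_mean, median = toy_median)
```

For this reason the median is often the better description of a typical flight, while the mean and standard deviation show how much the long delays weigh.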
Knowing the total flight count for each airline provides our clients with crucial information to assess market share, identify competitive strengths, and spot potential opportunities for collaboration or strategic alliances. This data allows airlines to benchmark themselves against competitors, evaluate their fleet utilization, and optimize route planning.
flights_by_airline <- flights |>
  group_by(carrier) |>
  summarize(number_of_flights = n()) |>
  arrange(desc(number_of_flights))
# Join with airlines to add the full carrier name
flights_by_airline <- flights_by_airline |>
  left_join(airlines, by = "carrier")
# Display the result as a styled HTML table
kable(flights_by_airline, "html") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| carrier | number_of_flights | name |
|---|---|---|
| YX | 88785 | Republic Airline |
| UA | 79641 | United Air Lines Inc. |
| B6 | 66169 | JetBlue Airways |
| DL | 61562 | Delta Air Lines Inc. |
| 9E | 54141 | Endeavor Air Inc. |
| AA | 40525 | American Airlines Inc. |
| NK | 15189 | Spirit Air Lines |
| WN | 12385 | Southwest Airlines Co. |
| AS | 7843 | Alaska Airlines Inc. |
| OO | 6432 | SkyWest Airlines Inc. |
| F9 | 1286 | Frontier Airlines Inc. |
| G4 | 671 | Allegiant Air |
| HA | 366 | Hawaiian Airlines Inc. |
| MQ | 357 | Envoy Air |
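To read the counts above as market share, each carrier's count can be divided by the total. The following is a minimal sketch that re-types the counts from the table into a small data frame so it runs on its own; the column names `share_pct` and `cum_share_pct` are introduced here for illustration.

```r
# Flight counts per carrier, copied from the table above
counts <- data.frame(
  carrier = c("YX", "UA", "B6", "DL", "9E", "AA", "NK",
              "WN", "AS", "OO", "F9", "G4", "HA", "MQ"),
  number_of_flights = c(88785, 79641, 66169, 61562, 54141, 40525, 15189,
                        12385, 7843, 6432, 1286, 671, 366, 357)
)

# Share of all departures in percent, and the running total of that share
# (rows are already sorted from the largest carrier to the smallest)
counts$share_pct <- 100 * counts$number_of_flights / sum(counts$number_of_flights)
counts$cum_share_pct <- cumsum(counts$share_pct)

head(counts, 5)
```

Within the pipeline itself, the same column could be added with `mutate(share_pct = 100 * number_of_flights / sum(number_of_flights))`.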
We harness the power of data analytics to uncover seasonal trends that are pivotal for strategic planning and operational efficiency in the aviation industry. Analyzing the number of flights per month allows us to identify seasonal patterns that have significant implications for airlines and airports alike.
flights_by_month <- flights |>
  # Convert the month number (1-12) to an ordered English month name
  mutate(month = factor(month.name[month], levels = month.name)) |>
  group_by(month) |>
  summarize(number_of_flights = n())
# Plotting flights per month
flights_by_month_plot <- flights_by_month |>
  ggplot(aes(x = month, y = number_of_flights)) +
  geom_col(fill = "#272727") +
  labs(title = "Number of Flights per Month", x = "Month", y = "Number of Flights") +
  theme_minimal()
ggplotly(flights_by_month_plot)
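When comparing months, note that raw monthly counts partly reflect the length of the month. One way to separate calendar effects from seasonal ones is to divide each count by the number of days in that month. The sketch below assumes hypothetical counts for three months (they are not taken from the dataset) and uses the 2023 calendar, which has no leap day.

```r
# Days per month in 2023 (not a leap year)
days_in_month_2023 <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Hypothetical monthly counts for January to March (not taken from the dataset)
toy_months <- data.frame(
  month = 1:3,
  number_of_flights = c(31000, 28000, 31000)
)

# Flights per calendar day removes the month-length effect
toy_months$flights_per_day <-
  toy_months$number_of_flights / days_in_month_2023[toy_months$month]

toy_months
```

In this toy example February has fewer flights in total, yet the daily rate is the same in all three months, so the dip in the raw count would not indicate a seasonal slowdown.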
By providing insights into these patterns, we help airlines and airports optimize their schedules, enhance ground operations, and minimize delays. This data-driven approach not only enhances operational effectiveness but also significantly improves passenger experience by reducing wait times and maintaining reliable schedules.
delay_by_hour <- flights |>
  group_by(hour) |>
  summarize(avg_dep_delay = mean(dep_delay, na.rm = TRUE),
            avg_arr_delay = mean(arr_delay, na.rm = TRUE))
# Plot average departure and arrival delay by hour of day
delay_by_hour_plot <- delay_by_hour |>
  ggplot() +
  geom_line(aes(x = hour, y = avg_dep_delay, color = "Departure"), linewidth = 1) +
  geom_line(aes(x = hour, y = avg_arr_delay, color = "Arrival"), linewidth = 1) +
  labs(
    title = "Average Delay by Hour",
    x = "Hour of Day",
    y = "Average Delay (minutes)",
    color = "Type"
  ) +
  theme_minimal()
ggplotly(delay_by_hour_plot)
By offering a detailed analysis of the air time and distance relationship, we enable airlines to optimize route planning, improve flight scheduling, and enhance overall operational performance, ensuring a balance between efficiency, cost, and customer satisfaction.
air_time_distance_plot <- flights |>
  ggplot(aes(x = distance, y = air_time)) +
  geom_point(alpha = 0.3, color = "blue") +
  geom_smooth(method = "lm", col = "red") +
  labs(title = "Air Time vs. Distance", x = "Distance (miles)", y = "Air Time (minutes)") +
  theme_light()
air_time_distance_plot
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 12534 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 12534 rows containing missing values or values outside the scale range
## (`geom_point()`).
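The red line in the plot is an ordinary least-squares fit, and its coefficients can be read directly with `lm()`. The sketch below fits the same kind of model on a small hypothetical data frame (the values are made up, not taken from `flights`) and converts the slope into an implied average speed.

```r
# Hypothetical flights: distance in miles, air time in minutes
toy_flights <- data.frame(
  distance = c(200, 500, 1000, 1500, 2500),
  air_time = c(45, 85, 150, 215, 345)
)

# Same model that geom_smooth(method = "lm") fits: air_time ~ distance
fit <- lm(air_time ~ distance, data = toy_flights)

intercept <- unname(coef(fit)["(Intercept)"])  # minutes not explained by distance
slope <- unname(coef(fit)["distance"])         # additional minutes per additional mile

# A slope in minutes per mile implies a speed in miles per hour
implied_speed_mph <- 60 / slope

c(intercept = intercept, slope = slope, implied_speed_mph = implied_speed_mph)
```

On the real data, `lm(air_time ~ distance, data = flights)` would give the corresponding coefficients; `lm()` drops rows with missing values by default, which matches the rows removed in the warnings above.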
FLYDATA begins by identifying the busiest routes. Insights into high-traffic corridors can drive strategic planning for airlines and airports.
busiest_routes <- flights |>
  group_by(origin, dest) |>
  # .groups = "drop" returns an ungrouped result and avoids the grouping message
  summarize(number_of_flights = n(), .groups = "drop") |>
  arrange(desc(number_of_flights))
knitr::kable(head(busiest_routes, 10), caption = "Top 10 Busiest Routes from NYC")
| origin | dest | number_of_flights |
|---|---|---|
| JFK | LAX | 10045 |
| LGA | ORD | 9923 |
| LGA | BOS | 8217 |
| LGA | ATL | 7883 |
| JFK | SFO | 7440 |
| EWR | MCO | 7262 |
| JFK | BOS | 6432 |
| LGA | DFW | 5972 |
| JFK | MIA | 5930 |
| EWR | ATL | 5915 |
busiest_routes_plot <- busiest_routes |>
  slice_max(number_of_flights, n = 10) |>
  ggplot(aes(x = reorder(paste(origin, dest, sep = " - "), number_of_flights), y = number_of_flights)) +
  geom_col(fill = "lightgreen") +
  coord_flip() +
  labs(title = "Top 10 Busiest Routes from NYC", x = "Route", y = "Number of Flights") +
  theme_minimal()
ggplotly(busiest_routes_plot)
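Our exploration shows that the busiest routes serve major hubs: JFK to LAX leads with 10,045 flights, followed by LGA to ORD with 9,923, and ATL appears twice in the top ten (from LGA and from EWR). This information aids in optimizing fleet allocation and scheduling.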
FLYDATA also analyzes how weather affects delays, here using wind speed at JFK, to help bolster operational resilience.
jfk_weather_delay <- flights |>
  filter(origin == "JFK") |>
  left_join(weather, by = c("origin", "year", "month", "day", "hour")) |>
  group_by(date = as.Date(paste(year, month, day, sep = "-"))) |>
  summarize(avg_dep_delay = mean(dep_delay, na.rm = TRUE),
            avg_wind_speed = mean(wind_speed, na.rm = TRUE))
knitr::kable(head(jfk_weather_delay), caption = "Sample Data of JFK Wind Speed Impact on Delays")
| date | avg_dep_delay | avg_wind_speed |
|---|---|---|
| 2023-01-01 | 18.764045 | 10.874225 |
| 2023-01-02 | 45.703833 | 7.068520 |
| 2023-01-03 | 38.898649 | 4.901329 |
| 2023-01-04 | 32.215488 | 5.684622 |
| 2023-01-05 | 11.787879 | 6.163238 |
| 2023-01-06 | 7.862876 | 5.199678 |
jfk_weather_delay |>
  ggplot(aes(x = avg_wind_speed, y = avg_dep_delay)) +
  geom_point(alpha = 0.5, color = "orange") +
  geom_smooth(method = "lm", col = "darkgreen") +
  labs(title = "Impact of Wind Speed on Departure Delay at JFK", x = "Average Wind Speed (mph)", y = "Average Departure Delay (minutes)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
Having identified a positive correlation between average wind speed and average departure delay at JFK, we recommend leveraging weather data to anticipate delays and refine scheduling protocols.
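The strength of such a relationship can be quantified with a correlation coefficient rather than read off the trend line. The sketch below computes a Pearson correlation on a hypothetical table shaped like `jfk_weather_delay`; the numbers are made up for illustration and include one missing wind reading.

```r
# Hypothetical daily averages (not taken from the dataset);
# the last day has no wind reading
toy_daily <- data.frame(
  avg_wind_speed = c(4, 6, 8, 10, 12, NA),
  avg_dep_delay  = c(8, 12, 11, 18, 22, 15)
)

# Pearson correlation, using only days where both values are present
r <- cor(toy_daily$avg_wind_speed, toy_daily$avg_dep_delay, use = "complete.obs")
r
```

Applied to the real table, the call would be `cor(jfk_weather_delay$avg_wind_speed, jfk_weather_delay$avg_dep_delay, use = "complete.obs")`. Without `use = "complete.obs"`, a single missing value makes `cor()` return `NA`, and the plot warnings above report one such row.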
FLYDATA investigates whether certain days of the week experience longer delays, which can point to opportunities for more efficient staffing and scheduling.
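weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
flights_dayofweek <- flights |>
  mutate(
    # weekdays() returns names in the current locale; English names are assumed here.
    # Storing them as an ordered factor keeps Monday-to-Sunday order in tables and plots.
    weekday = factor(weekdays(as.Date(paste(year, month, day, sep = "-"))),
                     levels = weekday_levels)
  ) |>
  group_by(weekday) |>
  summarize(avg_dep_delay = mean(dep_delay, na.rm = TRUE))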
knitr::kable(flights_dayofweek, caption = "Average Departure Delay by Day of the Week")
| weekday | avg_dep_delay |
|---|---|
| Monday | 14.73471 |
| Tuesday | 10.85576 |
| Wednesday | 10.60587 |
| Thursday | 11.90029 |
| Friday | 16.58209 |
| Saturday | 15.84804 |
| Sunday | 16.99358 |
flights_dayofweek |>
  ggplot(aes(x = weekday, y = avg_dep_delay)) +
  geom_col(fill = "dodgerblue") +
  labs(title = "Average Departure Delay by Day of the Week", x = "Day of Week", y = "Average Departure Delay (minutes)") +
  theme_minimal()
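Average departure delays are highest from Friday through Sunday (about 16 to 17 minutes) and lowest on Tuesday and Wednesday (about 11 minutes), suggesting that staffing and resource allocation should focus on the weekend and the days around it.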
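The comparison between weekend and weekday delays can be made explicit using the per-day averages from the table above. The sketch below re-types those seven values and averages them by group. This is an unweighted mean of daily averages, so it ignores that some days carry more flights than others; the helper column `is_weekend` is introduced here for illustration.

```r
# Average departure delay per weekday, copied from the table above
delays <- data.frame(
  weekday = c("Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"),
  avg_dep_delay = c(14.73471, 10.85576, 10.60587, 11.90029,
                    16.58209, 15.84804, 16.99358)
)

# Flag Saturday and Sunday, then average the daily values within each group
delays$is_weekend <- delays$weekday %in% c("Saturday", "Sunday")
group_means <- tapply(delays$avg_dep_delay, delays$is_weekend, mean)

weekday_mean <- unname(group_means["FALSE"])
weekend_mean <- unname(group_means["TRUE"])

c(weekday = weekday_mean, weekend = weekend_mean)
```

A flight-weighted version would instead compute `mean(dep_delay, na.rm = TRUE)` on `flights` after grouping by the same weekend flag.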
Our analysis examines the relationship between aircraft age and usage, providing insights into fleet management strategies.
plane_ages <- flights |>
  left_join(planes, by = "tailnum") |>
  # year.y is the aircraft's year of manufacture from `planes`
  mutate(plane_age = 2023 - year.y) |>
  group_by(plane_age) |>
  summarize(number_of_flights = n())
knitr::kable(filter(plane_ages, !is.na(plane_age)), caption = "Flights by Aircraft Age in 2023")
| plane_age | number_of_flights |
|---|---|
| 0 | 9004 |
| 1 | 15393 |
| 2 | 10007 |
| 3 | 9323 |
| 4 | 14729 |
| 5 | 11362 |
| 6 | 20087 |
| 7 | 13653 |
| 8 | 19259 |
| 9 | 32598 |
| 10 | 24685 |
| 11 | 7089 |
| 12 | 5109 |
| 13 | 7061 |
| 14 | 11313 |
| 15 | 44442 |
| 16 | 30558 |
| 17 | 16239 |
| 18 | 15511 |
| 19 | 9622 |
| 20 | 5580 |
| 21 | 9691 |
| 22 | 15163 |
| 23 | 16022 |
| 24 | 15825 |
| 25 | 11340 |
| 26 | 3733 |
| 27 | 1801 |
| 28 | 1704 |
| 29 | 3494 |
| 30 | 993 |
| 31 | 1455 |
| 32 | 1243 |
| 33 | 1204 |
plane_ages |>
  filter(!is.na(plane_age)) |>
  ggplot(aes(x = plane_age, y = number_of_flights)) +
  geom_col(fill = "coral") +
  labs(title = "Flights by Aircraft Age in 2023", x = "Aircraft Age (years)", y = "Number of Flights") +
  theme_minimal()
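Our findings show that usage is spread across a wide range of aircraft ages, with peaks at 9 years (32,598 flights) and 15 years (44,442 flights), while each age from 26 years upward accounts for fewer than 4,000 flights. These patterns inform maintenance and fleet strategies to enhance safety and cost-efficiency.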
Customer satisfaction is a multidimensional aspect that’s pivotal for the sustained success and reputation of airlines and airports. At FLYDATA, we analyze various factors influencing customer satisfaction to provide our clients with insights for enhancing passenger experiences and ensuring customer loyalty. Here’s a look at some key insights into customer satisfaction:
This demonstration from FLYDATA illustrates how our data-driven insights can shape operational excellence in the aviation industry. From understanding route dynamics to optimizing aircraft usage, FLYDATA empowers stakeholders to make informed decisions. For bespoke analysis and deeper dives into your specific needs, connect with our team to explore customized solutions.